Fuzzy State Aggregation and Off-policy Reinforcement Learning for Stochastic Environments

Author

  • Dean C. Wardell
Abstract

Reinforcement learning is one of the more attractive machine learning technologies, due to its unsupervised learning structure and its ability to continue learning even as the environment in which it operates changes. This ability to learn unsupervised in a changing environment can be applied to complex domains through function approximation of the domain's policy. The function approximation presented here is fuzzy state aggregation. This article combines fuzzy state aggregation with the policy hill-climbing methods Win or Learn Fast (WoLF) and policy-dynamics-based WoLF (PD-WoLF), exceeding the learning rate and performance of fuzzy state aggregation combined with Q-learning. Test results in the TileWorld domain demonstrate that the policy hill-climbing methods perform better than the existing Q-learning implementations.
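For orientation, the sketch below shows one way the two ingredients fit together: a Q-table indexed by fuzzy aggregate states (each concrete state is a vector of membership degrees mu) with a WoLF-style policy hill-climbing update layered on an off-policy Q-learning step. The class name, parameter values, and the choice to scale updates by membership are illustrative assumptions, not code from the paper.

```python
# A minimal sketch of WoLF policy hill climbing over fuzzy aggregate states.
# Class name, parameters, and membership-weighted updates are illustrative
# assumptions, not code from the paper.
import numpy as np

class FuzzyWoLFPHC:
    def __init__(self, n_aggregates, n_actions, alpha=0.1, gamma=0.9,
                 delta_win=0.01, delta_lose=0.04):
        self.nA = n_actions
        self.q = np.zeros((n_aggregates, n_actions))        # Q per aggregate
        self.pi = np.full((n_aggregates, n_actions), 1.0 / n_actions)
        self.pi_avg = np.full((n_aggregates, n_actions), 1.0 / n_actions)
        self.visits = np.zeros(n_aggregates)
        self.alpha, self.gamma = alpha, gamma
        self.d_win, self.d_lose = delta_win, delta_lose

    def q_of(self, mu):
        # Fuzzy state aggregation: a concrete state's Q-values are the
        # membership-weighted mix of its aggregates' Q-values.
        return mu @ self.q

    def act(self, mu, rng):
        probs = mu @ self.pi
        return rng.choice(self.nA, p=probs / probs.sum())

    def update(self, mu, a, r, mu_next):
        # Off-policy (Q-learning) target over the aggregated next state.
        td = r + self.gamma * self.q_of(mu_next).max() - self.q_of(mu)[a]
        self.q[:, a] += self.alpha * mu * td   # spread credit by membership

        for s in np.nonzero(mu)[0]:
            self.visits[s] += 1
            self.pi_avg[s] += (self.pi[s] - self.pi_avg[s]) / self.visits[s]
            # WoLF: learn cautiously when winning, fast when losing.
            winning = self.pi[s] @ self.q[s] > self.pi_avg[s] @ self.q[s]
            delta = (self.d_win if winning else self.d_lose) * mu[s]
            # Hill-climb: shift probability mass toward the greedy action.
            self.pi[s] = np.maximum(self.pi[s] - delta / (self.nA - 1), 0.0)
            self.pi[s, np.argmax(self.q[s])] += 1.0 - self.pi[s].sum()
```

The PD-WoLF variant named in the abstract keeps the same structure but decides "winning" from the dynamics of the policy itself rather than by comparison against an average policy.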


Similar Articles

Fuzzy State Aggregation and Policy Hill Climbing for Stochastic Environments

Reinforcement learning is one of the more attractive machine learning technologies, due to its unsupervised learning structure and ability to continually learn even as the operating environment changes. Additionally, applying reinforcement learning to multiple cooperative software agents (a multi-agent system) not only allows each individual ag...


Adaptive Critic Based Adaptation of A Fuzzy Policy Manager for A Logistic System

We show that a reinforcement learning method, adaptive critic based approximate dynamic programming, can be used to create fuzzy policy managers for adaptive control of a logistic system. Two different architectures are used for the policy manager: a feedforward neural network and a fuzzy rule base. For both architectures, policy managers are trained that outperform LP and GA derived fixed po...
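As a rough illustration of the adaptive-critic idea, the fragment below performs one temporal-difference update of a linear critic and uses the same error signal to adjust a softmax actor standing in for the policy manager. The linear/softmax forms, feature vectors, and step sizes are assumptions made for this sketch; the paper's logistic-system models and fuzzy rule base are not reproduced here.

```python
# A generic adaptive-critic (actor-critic) update step; illustrative only.
import numpy as np

def adaptive_critic_step(w, theta, phi_s, phi_s_next, a, r,
                         alpha_critic=0.05, alpha_actor=0.01, gamma=0.95):
    """One TD-based update of a linear critic (w) and a softmax actor (theta)."""
    # Critic: temporal-difference error of the current value estimate.
    td = r + gamma * (w @ phi_s_next) - (w @ phi_s)
    w = w + alpha_critic * td * phi_s
    # Actor: nudge the preference of the taken action by the critic's signal.
    prefs = theta @ phi_s                       # one preference per action
    probs = np.exp(prefs - prefs.max())
    probs /= probs.sum()                        # softmax action probabilities
    grad = -probs[:, None] * phi_s[None, :]     # d log pi / d theta, all actions
    grad[a] += phi_s
    theta = theta + alpha_actor * td * grad
    return w, theta
```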


Policy Improvement for several Environments Extended Version

In this paper we state a generalized form of the policy improvement algorithm for reinforcement learning. This new algorithm can be used to find stochastic policies that optimize single-agent behavior for several environments and reinforcement functions simultaneously. We first introduce a geometric interpretation of policy improvement, define a framework to apply one policy to several envir...
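For context, the single-environment step being generalized is the classical policy improvement rule: act greedily with respect to the current policy's action values. In standard notation (not taken from this paper):

```latex
% Classical single-environment policy improvement.
\pi'(s) = \arg\max_{a} Q^{\pi}(s,a),
\qquad
Q^{\pi}(s,a) = \sum_{s'} P(s' \mid s, a)\,\bigl[ R(s,a,s') + \gamma V^{\pi}(s') \bigr],
```

where the policy improvement theorem guarantees $V^{\pi'}(s) \ge V^{\pi}(s)$ at every state; the papers above extend this step so a single stochastic policy improves across several environments and reward functions at once.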


Policy Improvement for several Environments

In this paper we state a generalized form of the policy improvement algorithm for reinforcement learning. This new algorithm can be used to find stochastic policies that optimize single-agent behavior for several environments and reinforcement functions simultaneously. We first introduce a geometric interpretation of policy improvement, define a framework to apply one policy to several envir...


Action Dependent State Space Abstraction for Hierarchical Learning Systems

To operate effectively in complex environments, learning agents have to selectively ignore irrelevant details by forming useful abstractions. In this paper we outline a formulation of abstraction for reinforcement learning approaches to stochastic decision problems by extending one of the recent minimization models, known as ε-reduction. The technique presented here extends ε-reduction to SMDPs ...
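For reference, ε-reduction descends from the model-minimization literature, where two states may share an aggregate block only if their one-step models agree to within ε. A common statement of that condition, written from the standard formulation rather than from this paper, is:

```latex
% A partition into blocks B_1, ..., B_n is epsilon-homogeneous when, for every
% action a, every block B_j, and every pair s_1, s_2 in the same block B_i:
\lvert R(s_1,a) - R(s_2,a) \rvert \le \epsilon
\quad\text{and}\quad
\Bigl\lvert \sum_{s' \in B_j} P(s' \mid s_1,a) \;-\; \sum_{s' \in B_j} P(s' \mid s_2,a) \Bigr\rvert \le \epsilon .
```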




Publication year: 2006